Using the Output Embedding to Improve Language Models
We study the topmost weight matrix of neural network language models. We show
that this matrix constitutes a valid word embedding. When training language
models, we recommend tying the input embedding and this output embedding. We
analyze the resulting update rules and show that the tied embedding evolves in
a more similar way to the output embedding than to the input embedding in the
untied model. We also offer a new method of regularizing the output embedding.
Our methods lead to a significant reduction in perplexity, as we are able to
show on a variety of neural network language models. Finally, we show that
weight tying can reduce the size of neural translation models to less than half
of their original size without harming their performance.
Comment: To appear in EACL 2017
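As a concrete illustration of weight tying, here is a minimal PyTorch sketch
(the class and hyperparameters are illustrative, not the paper's exact setup)
in which the output projection reuses the input embedding matrix:

    import torch
    import torch.nn as nn

    class TiedLM(nn.Module):
        """Toy language model with tied input/output embeddings."""
        def __init__(self, vocab_size: int, hidden_size: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size, bias=False)
            # Weight tying: the output projection shares the input
            # embedding's (vocab_size x hidden_size) matrix, so a single
            # matrix serves both roles and receives both gradient signals.
            self.out.weight = self.embed.weight

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            h, _ = self.rnn(self.embed(tokens))
            return self.out(h)  # logits over the vocabulary

Since the two vocabulary-sized matrices dominate the parameter count of such
models, sharing one of them is also where the reported size reduction for
translation models comes from.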
How Language Model Hallucinations Can Snowball
A major risk of using language models in practical applications is their
tendency to hallucinate incorrect statements. Hallucinations are often
attributed to knowledge gaps in LMs, but we hypothesize that in some cases,
when justifying previously generated hallucinations, LMs output false claims
that they can separately recognize as incorrect. We construct three
question-answering datasets where ChatGPT and GPT-4 often state an incorrect
answer and offer an explanation with at least one incorrect claim. Crucially,
we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes,
respectively. We refer to this phenomenon as hallucination snowballing: an LM
over-commits to early mistakes, leading to more mistakes that it otherwise
would not make.
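The detection procedure the abstract implies can be sketched as a two-stage
probe (a minimal sketch; the prompt wording and the generate callable are
placeholders, not the paper's actual prompts or API):

    def snowball_probe(generate, question: str, claim: str) -> dict:
        """Ask for an answer plus justification, then separately ask the
        same model to verify a claim taken from that justification.
        `generate` is any text-completion callable (placeholder)."""
        # Stage 1: the model commits to an answer and justifies it.
        answer = generate(f"Q: {question}\nAnswer Yes or No, then explain.")
        # Stage 2: a fresh context asks only about the supporting claim,
        # so the model is not conditioned on its earlier commitment.
        verdict = generate(f"Is the following claim true or false?\n{claim}")
        return {"answer": answer, "verdict": verdict}

Snowballing is diagnosed when stage 1 asserts a claim that stage 2, asked in
isolation, labels false.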
Measuring and Narrowing the Compositionality Gap in Language Models
We investigate the ability of language models to perform compositional
reasoning tasks where the overall solution depends on correctly composing the
answers to sub-problems. We measure how often models can correctly answer all
sub-problems but not generate the overall solution, a ratio we call the
compositionality gap. We evaluate this ratio by asking multi-hop questions with
answers that require composing multiple facts unlikely to have been observed
together during pretraining. In the GPT-3 family of models, we show that as
model size increases, single-hop question answering performance improves
faster than multi-hop performance, so the compositionality gap does not
decrease. This surprising result suggests that while more powerful
models memorize and recall more factual knowledge, they show no corresponding
improvement in their ability to perform this kind of compositional reasoning.
We then demonstrate how elicitive prompting (such as chain of thought)
narrows the compositionality gap by reasoning explicitly instead of implicitly.
We present a new method, self-ask, that further improves on chain of thought.
In our method, the model explicitly asks itself (and then answers) follow-up
questions before answering the initial question. We finally show that
self-ask's structured prompting lets us easily plug in a search engine to
answer the follow-up questions, which additionally improves accuracy.
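A minimal sketch of a self-ask-style loop with a pluggable search engine
(the two callables and the loop bound are illustrative; the marker strings
follow the scaffold the paper describes):

    def self_ask(generate, search, question: str, max_hops: int = 4) -> str:
        """Self-ask-style prompting (illustrative): the model writes its
        own follow-up questions, and a search engine answers each one
        before the model continues toward the final answer."""
        prompt = (f"Question: {question}\n"
                  "Are follow up questions needed here: Yes.\n")
        for _ in range(max_hops):
            step = generate(prompt, stop=["Intermediate answer:"])
            prompt += step
            if "So the final answer is:" in step:
                return step.split("So the final answer is:")[-1].strip()
            if "Follow up:" in step:
                follow_up = step.split("Follow up:")[-1].strip()
                # Route the sub-question to retrieval instead of the
                # model's parametric memory.
                prompt += f"Intermediate answer: {search(follow_up)}\n"
        return generate(prompt + "So the final answer is:").strip()

The fixed markers ("Follow up:", "Intermediate answer:", "So the final
answer is:") are what make it easy to intercept sub-questions and route them
to a search engine.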
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Language models have outpaced our ability to evaluate them effectively, but
for their future development it is essential to study the frontier of their
capabilities. We consider real-world software engineering to be a rich,
sustainable, and challenging testbed for evaluating the next generation of
language models. We therefore introduce SWE-bench, an evaluation framework
including 2,294 software engineering problems drawn from real GitHub issues
and corresponding pull requests across 12 popular Python repositories. Given
a codebase along with a description of an issue to be resolved, a language
model is tasked with editing the codebase to address the issue. Resolving
issues in SWE-bench frequently requires understanding and coordinating changes
across multiple functions, classes, and even files simultaneously, calling for
models to interact with execution environments, process extremely long contexts
and perform complex reasoning that goes far beyond traditional code generation.
Our evaluations show that both state-of-the-art proprietary models and our
fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and
GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when
provided with an oracle retriever. Advances on SWE-bench represent steps
towards LMs that are more practical, intelligent, and autonomous.
Comment: Data, code, and leaderboard are available at https://www.swebench.com
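The evaluation the abstract describes reduces to a patch-and-test loop; a
minimal sketch follows (helper names and the test command are hypothetical,
and the real harness adds isolated execution environments and per-instance
test selection):

    import subprocess

    def resolves_issue(repo_dir: str, base_commit: str, model_patch: str,
                       test_cmd: list[str]) -> bool:
        """SWE-bench-style check (illustrative): an instance counts as
        resolved only if the model's edit applies cleanly and the
        repository's tests pass afterwards."""
        subprocess.run(["git", "checkout", base_commit],
                       cwd=repo_dir, check=True)
        applied = subprocess.run(["git", "apply", "-"], cwd=repo_dir,
                                 input=model_patch, text=True)
        if applied.returncode != 0:
            return False  # the generated patch does not even apply
        tests = subprocess.run(test_cmd, cwd=repo_dir)
        return tests.returncode == 0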
Architecture of Planetary Systems Based on Kepler Data: Number of Planets and Coplanarity
We investigated the underlying architecture of planetary systems by deriving
the distribution of planet multiplicity (number of planets) and the
distribution of orbital inclinations based on the sample of planet candidates
discovered by the Kepler mission. The scope of our study included solar-like
stars and planets with orbital periods less than 200 days and with radii
between 1.5 and 30 Earth radii, and was based on Kepler planet candidates
detected during Quarters 1 through 6. We created models of planetary systems
with different distributions of planet multiplicity and inclinations, simulated
observations of these systems by Kepler, and compared the properties of the
transits of detectable objects to actual Kepler planet detections.
Specifically, we compared with both the Kepler sample's transit numbers and
normalized transit duration ratios in order to determine each model's
goodness-of-fit. We did not include any constraints from radial velocity
surveys. Based on our best-fit models, 75-80% of planetary systems have 1 or 2
planets with orbital periods less than 200 days. In addition, over 85% of
planets have orbital inclinations less than 3 degrees (relative to a common
reference plane). This high degree of coplanarity is comparable to that seen in
our Solar System. These results have implications for planet formation and
evolution theories. Low inclinations are consistent with planets forming in a
protoplanetary disk, followed by evolution without significant and lasting
perturbations from other bodies capable of increasing inclinations.
Comment: 16 pages, 7 figures, accepted to ApJ
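The model comparison described above is a forward Monte Carlo; the toy
version below (all distributions, scalings, and the detection criterion are
placeholder choices, not the paper's calibrated ones) shows its overall
shape: draw systems, decide which planets transit for a random observer, and
histogram the per-system transit counts against the Kepler sample:

    import numpy as np

    rng = np.random.default_rng(0)

    def simulated_transit_counts(n_systems: int) -> np.ndarray:
        """Toy forward model (illustrative): sample multiplicity and
        inclinations per system, then count transiting planets for a
        randomly oriented observer."""
        counts = []
        for _ in range(n_systems):
            n_planets = rng.integers(1, 9)                   # multiplicity model
            incl = np.deg2rad(rng.rayleigh(2.0, n_planets))  # tilt off a plane
            periods = rng.uniform(3.0, 200.0, n_planets)     # days, per the cut
            a_over_rstar = 4.0 * periods ** (2.0 / 3.0)      # rough Kepler scaling
            los = np.arccos(rng.uniform(-1.0, 1.0))          # observer direction
            # Crude geometric criterion: the orbit must pass within one
            # stellar radius of the line of sight for a transit.
            transits = np.abs(np.cos(los + incl)) < 1.0 / a_over_rstar
            counts.append(int(transits.sum()))
        return np.asarray(counts)

Comparing these simulated counts (and, in the paper, normalized transit
duration ratios as well) with the observed Kepler detections scores each
candidate multiplicity/inclination model's goodness-of-fit.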
The multi-configurational time-dependent Hartree method for bosons: Many-body dynamics of bosonic systems
The evolution of Bose-Einstein condensates is amply described by the
time-dependent Gross-Pitaevskii mean-field theory which assumes all bosons to
reside in a single time-dependent one-particle state throughout the propagation
process. In this work, we go beyond mean-field and develop an essentially-exact
many-body theory for the propagation of the time-dependent Schrödinger
equation of interacting identical bosons. In our theory, the time-dependent
many-boson wavefunction is written as a sum of permanents assembled from
orthogonal one-particle functions, or orbitals, where both the expansion
coefficients and the permanents (orbitals) themselves are time-dependent
and fully determined according to a standard time-dependent
variational principle. By employing either the usual Lagrangian formulation or
the Dirac-Frenkel variational principle we arrive at two sets of coupled
equations-of-motion, one for the orbitals and one for the expansion
coefficients. The first set comprises first-order differential equations in
time and non-linear integro-differential equations in position space, whereas
the second set consists of first-order differential equations with
time-dependent coefficients. We call our theory multi-configurational
time-dependent Hartree for bosons, or MCTDHB(M), where M specifies the
number of time-dependent orbitals used to construct the permanents. Numerical
implementation of the theory is reported and illustrative numerical examples of
many-body dynamics of trapped Bose-Einstein condensates are provided and
discussed.
Comment: 30 pages, 2 figures
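The ansatz the abstract describes can be written compactly (standard MCTDHB
notation is assumed here, not necessarily the paper's exact labels):

    % N bosons distributed over M time-dependent orbitals: a sum of
    % permanents with time-dependent expansion coefficients.
    \left|\Psi(t)\right\rangle
      = \sum_{\vec{n}} C_{\vec{n}}(t)\, \left|\vec{n}; t\right\rangle,
    \qquad
    \vec{n} = (n_1, \ldots, n_M), \quad \sum_{k=1}^{M} n_k = N,

where each |\vec{n}; t\rangle is the permanent with n_k bosons in the
orbital \phi_k(\mathbf{r}, t); the variational principle then yields the two
coupled sets of equations of motion, one for the C_{\vec{n}}(t) and one for
the orbitals.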
Time-dependent multi-orbital mean-field for fragmented Bose-Einstein condensates
The evolution of Bose-Einstein condensates is usually described by the famous
time-dependent Gross-Pitaevskii equation, which assumes all bosons to reside in
a single time-dependent orbital. In the present work we address the evolution
of fragmented condensates, for which two (or more) orbitals are occupied, and
derive a corresponding time-dependent multi-orbital mean-field theory. We call
our theory TDMF(n), where n stands for the number of evolving fragments.
Working equations for a general two-body interaction between the bosons are
explicitly presented along with an illustrative numerical example.
Comment: 16 pages, 1 figure
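In the same (assumed) notation as for MCTDHB above, the fragmented
mean-field ansatz fixes the occupations and lets only the orbitals evolve;
for two fragments:

    % A single permanent with fixed occupations n_1 + n_2 = N: only the
    % two orbitals evolve, unlike MCTDHB where the expansion coefficients
    % are time-dependent as well.
    \left|\Psi(t)\right\rangle = \left|n_1, n_2; t\right\rangle
      \propto \hat{\mathcal{S}} \prod_{i=1}^{n_1} \phi_1(\mathbf{r}_i, t)
        \prod_{j=n_1+1}^{N} \phi_2(\mathbf{r}_j, t),

with \hat{\mathcal{S}} the symmetrizer over the N bosons. This
single-permanent restriction is what makes TDMF a mean-field theory, in
contrast to the multi-configurational expansion above.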